MacSaar at SemEval-2016 Task 11: Zipfian and Character Features for Complex Word Identification

Authors

  • Marcos Zampieri
  • Liling Tan
  • Josef van Genabith
Abstract

This paper presents the MacSaar system, developed to identify complex words in English texts. MacSaar participated in SemEval-2016 Task 11: Complex Word Identification, submitting two runs. The system is based on the assumption that complex words are likely to be less frequent and, on average, longer than words considered to be simple. We report results of 82.5% accuracy and 27% F-score using a Random Forest classifier. The best MacSaar submission was ranked 8th in terms of F-measure among 45 entries.
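The assumption stated in the abstract (complex words tend to be rarer and longer) can be illustrated with a minimal sketch of frequency and length features. This is not the MacSaar implementation: the toy corpus, the add-one smoothing, and the names `build_frequency_table` and `word_features` are assumptions made for illustration, and the classifier step is omitted.

```python
import math
from collections import Counter

# Toy reference corpus (hypothetical); a real system would use a large corpus.
CORPUS = (
    "the cat sat on the mat and the dog saw the cat "
    "near the obstreperous neighbour"
).split()


def build_frequency_table(tokens):
    """Count tokens and assign Zipf-style ranks (rank 1 = most frequent)."""
    counts = Counter(t.lower() for t in tokens)
    # Ties are broken alphabetically so that ranks are deterministic.
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    ranks = {w: i + 1 for i, w in enumerate(ordered)}
    return counts, ranks


def word_features(word, counts, ranks):
    """Return frequency and length features for a single word."""
    w = word.lower()
    total = sum(counts.values())
    # Add-one smoothing keeps the log frequency finite for unseen words.
    log_freq = math.log((counts.get(w, 0) + 1) / (total + len(counts)))
    # Unseen words are ranked after every word seen in the corpus.
    rank = ranks.get(w, len(ranks) + 1)
    return {"log_freq": log_freq, "rank": rank, "length": len(w)}


counts, ranks = build_frequency_table(CORPUS)
simple = word_features("the", counts, ranks)
hard = word_features("obstreperous", counts, ranks)
```

Under this sketch, a frequent short word such as "the" gets a higher log frequency, a lower rank, and a smaller length than a rare long word such as "obstreperous". Feature vectors of this kind could then be passed to a classifier such as a random forest.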

Similar resources

UWB at SemEval-2016 Task 11: Exploring Features for Complex Word Identification

In this paper, we present our system developed for the SemEval 2016 Task 11: Complex Word Identification. Our team achieved the 3rd place among 21 participants. Our systems ranked 4th and 13th among 42 submitted systems. We proposed multiple features suitable for complex word identification, evaluated them, and discussed their properties. According to the results of our experiments, our final s...

CLaC at SemEval-2016 Task 11: Exploring linguistic and psycho-linguistic Features for Complex Word Identification

This paper describes the system deployed by the CLaC-EDLK team to the SemEval 2016, Complex Word Identification task. The goal of the task is to identify if a given word in a given context is simple or complex. Our system relies on linguistic features and cognitive complexity. We used several supervised models, however the Random Forest model outperformed the others. Overall our best configurat...

USAAR at SemEval-2016 Task 11: Complex Word Identification with Sense Entropy and Sentence Perplexity

This paper describes an information-theoretic approach to complex word identification using a classifier based on an entropy based measure based on word senses and sentence-level perplexity features. We describe the motivation behind these features based on information density and demonstrate that they perform modestly well in the complex word identification task in SemEval-2016. We also discus...

LTG at SemEval-2016 Task 11: Complex Word Identification with Classifier Ensembles

We present the description of the LTG entry in the SemEval-2016 Complex Word Identification (CWI) task, which aimed to develop systems for identifying complex words in English sentences. Our entry focused on the use of contextual language model features and the application of ensemble classification methods. Both of our systems achieved good performance, ranking in 2nd and 3rd place overall in ...

MAZA at SemEval-2016 Task 11: Detecting Lexical Complexity Using a Decision Stump Meta-Classifier

This paper describes team MAZA entries for the 2016 SemEval Task 11: Complex Word Identification (CWI). The task is a binary classification task in which systems are trained to predict whether a word in a sentence is considered to be complex or not. We developed our two systems for this task based on classifier stacking using decision stumps and decision trees. Our best system, using contextual...

Publication date: 2016